Archive

Archive for the ‘SSIS’ Category

SSIS :- Removing Datestamp From File Name When Moving From One Location to Other

April 12, 2015 1 comment

This question is comes from MSDN, link is here.

User Comment:-

Is there any way to move all the file from one location to another and remove the datepart from filename.

for eg:- if file name is abc_20150411.xls change to abc.xls. if file name does not contain date part then ignore it.

Lets create Two Folders Source and Destination in file system, after that come to BIDS or SSDT and create one New SSIS package.

Create 5 Variables in SSIS package like given below:-

Variables

Lets drag one foreach container into the package and set the properties like below:-

ForEachProperty

Inside the ForEach Container create one Script task

Inside Script Task, write below code:-

Imports System.Text.RegularExpressions

Public Sub Main()

Dts.Variables(“SourceFullPath”).Value = Dts.Variables(“SourceFolder”).Value.ToString() + Dts.Variables(“SourceFile”).Value.ToString()

Dts.Variables(“DestinationFullPath”).Value = Dts.Variables(“DestinationFolder”).Value.ToString() + Regex.Replace(Dts.Variables(“SourceFile”).Value.ToString(), “[_0-9]”, “”).ToString()

Dts.TaskResult = ScriptResults.Success

End Sub

Then  below Script task drag File Task

set properties of File Task as shown in below screen:-

FileTaskProperties

as we need to rename the file so , Select Operations as “Rename File”

Lets copy the files in Source folder , it will look like below:-

SourceFolder

Run the package, execution will look like below:-

ExecutionofPackage

lets check the destination folder and source folder to check the result

Result

In result the filenames are modified and no more date is added in the filename.

Thanks
Prasad

Learn Different BI Tools
Subscribe my YouTube Channel : BI Tools

Categories: MSDN, SSIS Tags: ,

Dynamic Connection String Settings in SSIS to Connect SSAS Cube, Should not have Invalid values

February 13, 2015 Leave a comment

Recently we have found one strange behavior while Processing SSAS cube using SSIS -Analysis Services Processing Task.

We have Dynamic connection string to connect the SSAS cube in SSIS package. Cube Database name we have given dynamically.

If Cube Database name is invalid one in the variable, and we setting this variable before connecting Cube to process the cube, still it throws the error.

However when we put the variable blank (no invalid name). Then it executes successfully.

If we connect the SQL database with above scenario we will not get any error , Only for SSAS its throws error.

Let’s check what we are mentioned above.

Create One SSIS package, create two variables one for SSAS cube database and another for SQL Database as shown below:-

Variables

Note Above two variables we have given invalid value.

Lets create Two Execute SQL task which will set the correct values of DB Name to respective variables.

After below those two Execute SQL task we will test the connection string as shown in below screen.

Structure To Test ConnectionString

Lets execute the package with wrong Names of DB , it will show below screen.

SSASFalied

So SSAS got failed, if you notice we have set the correct value before SSAS process task execute.

Let’s try with blank values.

SSAS with Blank Value

Now with blank values its working.

Thanks
Prasad

Learn Different BI Tools
Subscribe my YouTube Channel : BI Tools

Categories: SSIS Tags:

SSIS TIP #2: Table Variable Output with NOCOUNT ON

January 22, 2015 Leave a comment

When we using TABLE VARIABLE as source in the SSIS, we must set NOCOUNT ON.

In preview of the rows it can show the data rows, however when we execute it will not give any data.

Let’s check in SSIS.

Create one DFT in SSIS package and use the OLE DB Source and write down the below query.

 DECLARE @abc TABLE (Empid INT ,Name VARCHAR(50))

INSERT INTO @abc VALUES(1,’prasad’)

SELECT * FROM @abc

In below screen shows the Source query in OLEDB Source task.

OLE DB SOURCE

Try to Preview the rows by clicking the Preview… Button. It will show data as shown in the below screen.

PreviewSource

 

 

So we are getting data in preview. Lets run the package before that add the Union all component and data viewer to see the data flow between Source to destination.

NoDataRows

In above screen we can see there is no row passed between source to destination.

Lets add NOCOUNT ON in the source query as given below:-

SET NOCOUNT ON

DECLARE @abc TABLE (Empid INT ,Name VARCHAR(50))

INSERT INTO @abc VALUES(1,’prasad’)

SELECT * FROM @abc

SET NOCOUNT OFF

After above change, once again execute the package. Below screen shows the execution result

DataRows

Hence After using NOCOUNT ON we can see data in data viewer and number of rows in front of precedence constraint.

Thanks
Prasad

Learn Different BI Tools
Subscribe my YouTube Channel : BI Tools

Categories: SSIS Tags: ,

SSIS – Excel Tip #1:- SSIS Guessing Data Type of Excel Source Column

January 22, 2015 Leave a comment

First we check what the meaning of Guessing Data Type is. Later we try to resolve the issue.

Let’s say we have two excel files with data as given below:-

File: – EmployeeFile_v1.xls

EmployeeID Name Salary
1000 Rahul 6000
1000 Rahul 6000
1000 Rahul 6000
1000 Rahul 6000
1000 Rahul 6000
1000 Rahul 6000
1000 Rahul 6000
1000 Rahul 6000
abc Dilip 8000
1000 Rahul 6000
1000 Rahul 6000

 

File :- EmployeeFile_v2

EmployeeID Name Salary
Abc Dilip 8000
Abc Dilip 8000
Abc Dilip 8000
Abc Dilip 8000
Abc Dilip 8000
Abc Dilip 8000
1000 Rahul 6000

 

Let’s Create one SSIS package where we use Data Flow task, inside it Excel Source and using Data Viewer control we can check what data is giving with different files as discussed above.

Excel Different Sources

Above Screen we have two different Sources.

Source 1:- EmployeeFile_v1.xls

Source 2:- EmployeeFile_v2.xls

Using Data Viewer Control we will see what kind of data we are getting .

Below screen shows the data output of both the sources.

Excel Data Output

In above screen we can see the first file i.e. EmployeeFile_v1.xls has Dilip with EmployeeID =”Abc”.

However the output is converted to NULL in highlighted box above.

SSIS excel source guess the data type of the EmployeeID as Integer in first file as it has most of the data Integer. When String data came it can’t fit data into Integer data type so it stores as null.

However in second file also it guesses the data type as string, but numbers can be put into string.

To prevent Excel on TypeGuessing.

We need to do two steps as below:-

    • Connection String in Excel Connection manager should be written as Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\EmployeeFile_v1.xls;Extended Properties=”EXCEL 8.0;HDR=YES;IMEX=1″;

IMEX

    =1 should be added at the end.

  • We need to modify in the registry, where there is a parameter

Path to reach that parameter is

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel

By default TypeGuessRows has value 8, modify to 0.

After above steps just execute the package.

Excel Data Output Without TypeGuess

Now in above result we are not getting null in string data , as it is converted into string data type.

Thanks
Prasad

Learn Different BI Tools
Subscribe my YouTube Channel : BI Tools

Categories: SSIS Tags: ,

Data cleansing using Fuzzy Lookup -SSIS

October 8, 2012 Leave a comment

Fuzzy lookup transformation is one of the useful transformation used in data cleansing purpose.

In previous post we seen how to remove duplicate rows in source data using fuzzy grouping transformation. This Fuzzy Lookup transformation used identify the similar data in columns. For example User may enter data as spelling mistake like user name as “Robin” or “Robins” for the same person. So using reference data we can do fuzzy lookup to get the appropriate name.

Let see how we can use Fuzzy Lookup:-

We may have data which has spell is not consistent like

Fruits Month Cost
Mango Jan 10
Mangoees March 15
Mangoes Feb 20

 

SELECT ‘Mango’ Fruits ,’Jan’ Month,   10 Cost

UNION

SELECT ‘Mangoes’,  ‘Feb’, 20

UNION

SELECT ‘Mangoees’, ‘March’, 15

 

Above data when reference with look up table in ETL process, let say we have Fruit Dimension

DimFruit

FruitId FruitName
101 Mangoes
102 Apples
103 Grapes

 

Let Say we will lookup with the data in Dimension before any cleanup

Open SSIS designer and create one sample package and copy the source query into the OLE DB Source task and then lookup with the Fruit Dimension data.

1SourceQuery

 

Lets drag the normal lookup with having DIMFruit Query as below:-

SELECT 101 FruitId, ‘Mangoes’FruitName

UNION

SELECT 102 FruitId, ‘Apples’FruitName

UNION

SELECT 103 FruitId, ‘Grapes’FruitName

 

2LookupQuery

Go to the Column section in Transformation Editor as shown below screen:-

3.LookupColumn

 

In above screen we lookup on names and getting the value of FruitId which want to place in the Fact table.

4LookupMatchOutput

 

Add one unionall component with dataviewer control to get to know how many rows are matched.

Lets run the package, it will show as in below screen:-

5.Result

 

So we got 1 record as matched, however we want 3 records, but data is not correct due to miss spell.

Let’s use the FuzzyLookup:-

For Fuzzy Lookup we need to create DimFruit table physically as it will not take as query as we did in above lookup.

CREATE TABLE dbo.DimFruit(FruitId INT ,FruitName VARCHAR(100))

insert into dbo.DimFruit

values( 101 , ‘Mangoes’)

,( 102 , ‘Apples’)

,( 103 , ‘Grapes’)

Fuzzy lookup ask for table to reference and it will store as index for better performance during lookup.

6.FuzzylookupRefrence2

 

Go to the Column tab and set which source column should map with the reference column.

7.ColumnsRefrenced

 

Go to the advance tab and set the Similarity threshold to .50 , it depends upon how much similarity we want to check.

8.FuzzylookupRefrence

 

Lets add unionall and data viewer control to know the records.

9.FuzzyLookupResult

 

In above screen we can see we get the Fruitid for all the source data.

Thanks
Prasad

Learn Different BI Tools
Subscribe my YouTube Channel : BI Tools

Categories: SSIS Tags: ,

Pivot and UnPivot Transformations in SSIS

October 2, 2012 Leave a comment

Pivot and UnPivot Transformations are available in SSIS Data Flow transformations.

We have Pivot and UnPivot functionality in SQL Server database as well. The same functionality we can get from the SSIS designer too using above said transformations.

Let see the configurations and functionality of both transformations in one package.

Lets prepare some data as shown below:-

Student Name Subject Marks
Prasad Maths 80
Prasad Physics 75
Prasad Chemistry 85
Prasad Biology 77
Dilip Maths 75
Dilip Physics 85
Dilip Chemistry 77
Dilip Biology 80
Karthik Maths 90
Karthik Physics 80
Karthik Chemistry 85
Karthik Biology 78

Above Data should be Pivot to as shown below:-

Student Name Maths Physics Chemistry Biology
Prasad 80 75 85 77
Dilip 75 85 77 80
Karthik 90 80 85 78

after Pivot will do unpivot and get the same data as before doing Pivot.

Lets see in SSIS, how to do it.

Prepare data as shown in above example:-

SELECT  ‘Prasad’ AS StudentName ,‘Maths’ AS Subject,80 Marks

UNION

SELECT  ‘Prasad’,‘Physics’,75

UNION

SELECT  ‘Prasad’,‘Chemistry’,85

UNION

SELECT  ‘Prasad’,‘Biology’,77

UNION

SELECT ‘Dilip’,‘Maths’,75

UNION

SELECT ‘Dilip’,‘Physics’,85

UNION

SELECT ‘Dilip’,‘Chemistry’,77

UNION

SELECT ‘Dilip’,‘Biology’,80

UNION

SELECT ‘Karthik’,‘Maths’,90

UNION

SELECT ‘Karthik’,‘Physics’,80

UNION

SELECT ‘Karthik’,‘Chemistry’,85

UNION

SELECT ‘Karthik’,‘Biology’,78

Open the SSIS solution in BIDS. Create new package and add Data Flow Task (DFT) into the designer.
Double click on DFT and go inside of the DFT , add Ole DB Source component.

In OLE DB Source component copy paste the above code which will become our source data which needs to be pivoted.
Above step is shown in below screen shot.

Source Query

After this drag Pivot component from toolbox to designer. usually its in left side of the designer.

Pivot Component

Open the Pivot Editor by double click , go to the Input Columns and set the Avilable columns as shown below screenshot:-

PivotEditor

After above screen go to the Input and Output Properties screen and expand input Columns, it will look like below screen:-
Input Output Properties

After clicking on the Input column individually , we can see the properties of each column.
We have to set the PivotUsage property for each column as shown in below screen:-

Pivot Usage Property

StudentName Column should have 1

Subject Column should have 2

Marks Column should have 3

You may wonder what is these values 1,2,3 signify.

1 = column values become the rows of the pivot .

2= column values become the column names of the pivot , Different Subjects as value will be column name

3= column values that are pivoted in the pivot , Above Columns will have value of this column.

One more value is there 0 (Zero) which is used when we dont want that column to take part in pivot (pass through).

After input column properties are set, next step to set the Pivot Default output properties.

before Output column properties, we need to add columns as shown in below screen:-

PivotOutputAddColumn

Lets create column First Column StudentName and set the two properites, If these properties are not correct it will not alow to save also.
Property:- PivotKeyValue and SourceColumn as shown in below screen.

OutputColumnProperty

PivotKeyValue as Column Name which has value.

SourceColumn is LineageID of the Input Column which is 17 here, Lets verify by seeing LineageID of input Column which is shown in below screen:-

LineageID InputColumn

We need to notedown of all input columns Lineage ID which we  using in Pivot Transformation.

Next Column is like pivoted Column as shown in below screenshot:-

PivotedOutputColumnProperty

PivotKeyValue as the value.

SourceColumn which has the value to be shown under this pivoted column, so this Marks to be shown under Maths subject.

So Marks column LineageID should be entered, in above case its should be 23.

Like that rest columns to be created.

Add one Unionall component and then add precidence constraint and add dataviewr control to see the data.

Execute the package and it will show data as shown below screen:-

PivotResult

Lets remove the UnionAll and place Unpivot component as our data got pivoted we can use it as source and lets unpivot it using Unpivot transformation.
After this again we can get the actual data which is we used in source.

Double Click on unpivot transformation  it will open the editor as shown below:-
Unpivoteditor

In Student Column we dont want any change so we checked the Pass Through Checkbox.

Pivoted Column only we taken and give the Destincation column Subject, So that all values comes in Subejct column.

Pivot Key value column name is the column name which value will be come in front of Subjects in Marks Column.

Now we are done with Unpivot.

lets add dataviwer control to see the data.

PivotUnpivotResult

We can see Source data then Pivoted data then lastly Unpivoted data in above screen.

More technical stuff can be found in http://msbitips.com

Thanks
Prasad

Learn Different BI Tools
Subscribe my YouTube Channel : BI Tools

Categories: SSIS Tags: ,

SSIS Remove Duplicate Rows Using Fuzzy Grouping (SSIS Transformations)

September 27, 2012 4 comments

Source data may have the duplicate rows which needs to be removed as part of data cleansing task.

Fuzzy Grouping is one of the transformation in Data Flow Transformations which can be used to group the similar rows in the source stream line. This transformation performs data cleaning tasks by identifying rows of data that are likely to be duplicate.

Lets generate some of duplicate records as shown in below screen:-

Source Query

Create one DataFlow task inside which Drag OLE DB Source component and write the query shown above which will become as source data which has duplicate records.

OLEDBSource

Drag the Fuzzy Group component and open the editor.

Go to the Columns tab, and check the checkbox agist the column which you want to analyze of similarity.

FuzzyGroupParticipatingColumn

after that go to the advanced tab. Here we can define Similarity threshold.

SimilarityThreshold

Important values in aboe screenshot are _key_in and _key_out in our example as we are removing duplicate records.

Lets for temporary use drag the Union All component and add dataviewer control to look for the data after passing through FuzzyGroup Transformation.
In below screen, we can see the result.

FuzzyGroupOutput

So we can see in above screen , Key_in and Key_out has same value for unique rows. Hence we can take only those records which have Key_in=Key_out.

Remove the Union all and add conditional split transformation to get the unique rows as shown in below screen:-

ConditionalSplit

after that to check that drag Unionall component and add data viewer and run the task.

After running the data flow task it will show below screen:-

FinalResult

Hence in above result we can see Unique rows.

More technical stuff can be found in http://msbitips.com

Thanks
Prasad

Learn Different BI Tools
Subscribe my YouTube Channel : BI Tools

Categories: SSIS Tags: ,