Ware House Solutions

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Sunday, 29 January 2012

Dynamic SSIS package for Data Profiling

Posted on 05:17 by Unknown
The data cleansing element of a data warehousing projects can take a long time depending on the quality of the source data. You need to know the extent of the data quality problems early in the project. The data profiling task in SSIS is a good starting point assuming you have access to the source data and know which tables within the source are likely to be required.

I wanted a dynamic data profiling template that I could configure using variables and schedule during  a maintenance window to collect profiling data for specified tables within one or more databases. Looking into the problem I quickly found two good references.

The first was Jamie Thompsons blog from back in March/April 2008 - I'm not planning on moving to SQL Server 2012 any time soon so 2008 and 2008 R2 references are fine for now. The 10 articles are an excellent introduction to the data profiling task and interpreting the output.

But the most relevant article I found was by John Welch who seemed to have the same idea as me to create a re-useable data profiling package.

Data profiling template originally from John Welch.


It's a simple package and doesn't do everything I wanted. Specifically, it's not able to loop through multiple databases or limit to specified tables only. Also, I try to avoid direct input SQL statements in Execute SQL Tasks as I find administering SSIS packages with code buried in tasks to be less efficient than using parameterised stored procedures.

However, being pragmatic about things, it's not a package I'll run every day and it'll run in environments where I have no access to create an Admin database for DBA scripts and procedures, so I'll accept it as is. And I can schedule a SQL Admin job with multiple steps each calling the package with different configuration parameters to achieve my original goal.

You can download the source from the original blog here or with some minor amendments from me here.
Read More
Posted in Data Cleansing, Data Profiling, ETL, SSIS | No comments
Newer Posts Home
Subscribe to: Posts (Atom)

Popular Posts

  • SQL Bad Practice - Hardcoding
    Legacy systems present design challenges different to problems faced during greenfield development. One mistake I've seen time and again...
  • Data Warehousing Resources List
    Listed here are the best resources I've found for learning about data warehousing, useful techniques and methodologies. In no particular...
  • Big Data in the Real World - Part 2 of 2
    Part 1 of this series described the first 2 stages in the processing of clickstream data to allow sales conversion analysis. This post desc...
  • A 7-Zip Archiving Task for my SSIS Package Template
    I use an SSIS package template to speed up project delivery. 7-zip is my archiving tool of choice because it provides encryption and passwor...
  • Book Review - The Checklist Manifesto
    Periodically you read books which change your habits and make you more effective. The Checklist Manifesto is one such book. Authors who can ...
  • Pragmatic Principals - Agile
    The Data Warehouse exists to provide Business Intelligence. Its users will not always start with the end in mind. They will browse, discover...
  • Big Data in the Real World - Part 1 of 2
    There is so much talk of Big Data nowadays, mostly from sales people but also from technologists. Karen Lopez aka DataChick took issue at ...
  • Refactoring Legacy Databases
    Know your enemy - It might not always seem so, but legacy code was written with a process in mind. Determining what the code was originally ...
  • Building an Auditing Process for SSIS - Part 1
    An audit schema published in the excellent book SSIS Problem– Design – Solution allows the efficient capture of SSIS progress and errors in...
  • Pragmatic Principals - Focus
    The DW is a business system not a techie playground. Use Business Analyst's to focus all development on revenue generating (or saving) w...

Categories

  • 7-Zip
  • Architecture
  • Auditing
  • Bad Practice
  • Big Data
  • Book Review
  • Collations
  • Data Cleansing
  • Data Profiling
  • Data Warehouse
  • DBA
  • Design
  • Disaster Recovery
  • Error Handling
  • ETL
  • OLTP
  • Package Template
  • Principals
  • Professional Skills
  • Project Management
  • Quote
  • Refactoring
  • Replication
  • Resources
  • sFTP
  • SQL Agent
  • SSIS
  • T-SQL
  • WinSCP

Blog Archive

  • ►  2013 (8)
    • ►  September (1)
    • ►  July (1)
    • ►  May (2)
    • ►  March (2)
    • ►  February (1)
    • ►  January (1)
  • ▼  2012 (15)
    • ►  December (1)
    • ►  September (1)
    • ►  August (1)
    • ►  July (3)
    • ►  June (2)
    • ►  April (1)
    • ►  March (2)
    • ►  February (3)
    • ▼  January (1)
      • Dynamic SSIS package for Data Profiling
Powered by Blogger.

About Me

Unknown
View my complete profile