Posts

Converting One-to-Many Relationships to One-to-One in Data Warehousing

In a transactional database, one-to-many relationships are very common. For example, one product may belong to multiple categories, and one person may have multiple addresses. In data warehousing, however, we often need only a one-to-one relationship for data analytics. Here are some common ways to convert a one-to-many relationship in the source table. 1. Add Multiple Columns One way to convert a one-to-many relationship to a one-to-one relationship is by adding more columns to the target table. This method is sometimes called “flattening.” For example, if a customer has up to three phone numbers and is represented in a source table CustomerPhone as follows: CustomerPhoneID CustomerID Phone 1 123 555-1111 2 123 555-2222 3 123 555-3333 In the target table, we can have three columns for Phone: Phone1, Phone2 and Phone3.  To "flatten" the so...

Enhancing SSIS Performance for Large Tables

In an SSIS dataflow, when dealing with a large target table (with millions of records or more), performance can become a consideration. Here are some ideas to significantly improve performance: 1. Reconfigure SSIS Settings In the "Properties" of the data flow: DefaultBufferMaxRows: Increase this number to 10-100 times the original value (somewhere between 100,000 and 1,000,000) and observe which value provides the best performance. DefaultBufferSize: Increase this number to 10-100 times the original value (somewhere between 104,857,600 and 1,048,576,000) and observe which value provides the best performance. In the "OLE DB Destination Editor": Uncheck "Check constraints" Change the value of "Maximum insert commit size" to 0 Please note that the effectiveness of these setting changes may vary from server to server, so some testing and adjustment are necessary. 2. Drop All Indexes If any of the large target tables have indexes (including primary ...

Advanced SCD Type 2 Implementation in SSIS

In SSIS, the Slowly Changing Dimension (SCD) component is a useful tool for transforming dimension tables with SCD Type 1 and Type 2 columns. However, its default functionalities are limited. For instance, it requires choosing between an 'active' column or start date and end date to identify current and expired records. Additionally, it assumes that a NULL value in the end date indicates an active record.

Ingesting Files with Different Metadata Automatically in SSIS

When ingesting multiple files in a folder using SSIS, a Foreach Loop task is commonly used with a data flow inside the loop. However, this approach assumes that all the files have the same columns, allowing them to be ingested into a single table. Unfortunately, files that need to be ingested often have different metadata, meaning they contain different columns in different files. One possible solution is to create a separate package (data flow) for each type of destination table and use the Execute Package task within the loop to call each package ( as described in this article ). However, this approach requires creating multiple packages and workflows, mapping all the columns in each data flow, which can be cumbersome. A more desirable approach would be preferred.

Unzipping and Archiving Files in SSIS

This article explains how to unzip a zip file containing text files, ingest the files into database tables, and then archive the files in a designated folder using SSIS. The solution assumes that the zip file contains a subfolder with the same name as the zip file, and all the files are located within that subfolder.

Passing Parameters from Parent to Child Packages in SSIS

This document explains how to pass parameters from a parent package to a child package in SSIS. The scenario involves ingesting files from a folder into relevant database tables, with each file requiring a separate package. In each package, a data flow is created to move data from the file to the database table. The file name is used as the package name and database table name for automation purposes. Additionally, a main_control package is created to loop through each file in the folder and call the respective child package from the loop.