Wednesday, April 17, 2013

SSIS - Replace Dynamic SQL With Variables

     I was recently introduced to a number of SSIS packages containing execute SQL and data flow tasks, all of which were using dynamic SQL.  After reviewing the tasks I found that many could, and should, be rewritten using an SSIS variable.  The focus of this post will be on replacing dynamic SQL in SSIS with variables and is not meant to address the good, bad, or ugly of dynamic SQL.  If you want to evaluate the pros and cons of dynamic SQL PLEASE review Erland Sommarskog article that will answer any question you may have on this topic here.

     Without diving into what can become a heated debate over the use of dynamic SQL I will just outline several reasons for replacing dynamic SQL with an SSIS variable.

1. First and foremost IT IS MUCH EASIER to troubleshoot and debug statements passed as variables than those using dynamic SQL

     a. Dynamic SQL statements are very difficult to troubleshoot as the statement is only fully visible at runtime

     b. Statements passed using SSIS variables are available during debugging and can displayed using techniques such as a script task message box or viewed in the Locals window during a breakpoint

2. SSIS statement variables provide all of the benefits of creating an ad-hoc dynamic query without the limitations of dynamic SQL

This post will focus on, and support, these benefits, as well as pointing out the limitations of SSIS variables.

     Let's first focus on why dynamic SQL or SSIS variables are required.  One of the most common reasons for using dynamic SQL is that the column(s) and/or table(s) need to be dynamically defined at runtime. T-SQL does not support using variables for these types of queries.  The below example illustrates a query that would require using dynamic SQL since in its current form would result in an error:
DECLARE @table VARCHAR(50) = 'Person.Person'
SELECT *
FROM @table;
GO
Results Msg 1087, Level 16, State 1, Line 5 Must declare the table variable "@table".

To successfully execute this statement dynamic SQL can be used as displayed below using both EXEC and sp_executesql:
DECLARE @table VARCHAR(50) = 'Person.Person'
DECLARE @cmd AS NVARCHAR(4000)
SET @cmd = 'SELECT * FROM ' + @table
EXEC (@cmd);
EXEC sp_executesql @cmd;
GO

The above sample shows an overly simplistic, but quite common, example of dynamic SQL.  SSIS provides the means of replacing dynamic SQL with variables that provide the same functionality, but are far easier to troubleshoot. 

     To demonstrate using variables for dynamic statements create a package and two variables: Statement  Data Type String

TableName Data Type:String Value:Person.Person

image

Click the ellipse button under the Expression column for the Statement variable and enter the following expression:

"SELECT FirstName, MiddleName, LastName FROM  " +  @[User::TableName]

image      Add a data flow task to the control flow tab and within the data flow add a OLEDB data source and configure the source data access mode to SQL command from variable:

image

With the statement defined and the data source configured to use it you need only add a destination to send the query results.

image

       This seems simple enough, and in fact a bit too easy.  The point of using a variable as a statement is to be able to dynamically create a query, which the above sample is somewhat lacking.  To better demonstrate a dynamically generated statement add an execute SQL task to the control flow pane and set the Result Set to Single row and use the query below for the SQLStatement value:

SELECT SCHEMA_NAME(schema_id) + '.' + name AS name
FROM sys.tables
WHERE SCHEMA_NAME(schema_id) = 'Person'
AND OBJECT_NAME(OBJECT_ID) = 'Person'

image

In order for the TableName variable to be assigned the value from the query configure the Result Set property to assign the output value:

imageAt run time the first execute SQL task will populate the value for the TableName variable which will in turn be used in the Statement variable that is used in the data source task.

image

     Using an SSIS variable as a statement provides the ability to dynamically generate a statement without using dynamic SQL and is far easier to troubleshoot.  Unlike dynamic SQL SSIS variable values can be viewed during debugging of a package allowing you to see the statement that will be used in either an execute SQL or data flow task:

imageTo keep true to my word I wil mention one limitation of using a variable as a statement is that there is a maximum of 4,000 characters in an expression, which can make it a bit trying when working with a rather large and complex dynamic statement.  Please keep in mind that this limitation is for expressions and not for string variables, which means that it is possible to create multiple string variables, assign their values in each expression, and then use a variable to concatenate all of the variables together.  The below illustrates where 4 different variables were created all defining an expression of up to 4,000 characters.  Finally a variable called FullStatement concatenates each one of the variables together. 

image

Kind of a pain and can definitely make using variables as a value a bit more complex, but it works all the same.

     The sample package referenced in this post can be downloaded from my site here.

Sunday, January 27, 2013

SQL Safety is Moving!!

     Over the past several months I have been able to return to blogging and the forums that I used to frequent.  I am thrilled to have the time to once again blog regularly and return to the SQL community. 

     I have received several emails asking that I provide my sample code, script, packages, etc.  I decided that it is time to move to  a site where I have more control over the look and feel of my site and provide my sample code for download.  My new site address is http://www.sqlsafety.com and over the next several days I will be posting here, as well as doing some customizations to the look and feel.

     I hope to see you all there soon.

Saturday, January 26, 2013

Where Are Your Sample Files?!?!

     I have received requests to provide my sample files for download from my posts.  I will be updating my previous posts to include links to all sample files and scripts and all new posts will include links to samples.  I will update the past posts by including the download link at the bottom of each page within the next week.

 

     Thanks for the feedback!

Friday, January 25, 2013

SSIS Precedence Constraints

     Precedence constraints provide a means of controlling the flow of a package based upon a tasks succeeding, failing, or completing, and offers the ability to evaluate a variable value with task completion to direct package path.  The most basic precedence constraint allows defining the flow of a package based upon task success, failure, or completion.

     The following example will provide a simplistic example of a precedence constraint used only the task completion status.  The package contains a variable called Constraint that is a string data type which will be used to capture each task status.  Again to simplify this example a sequence container containing three script tasks are added to to demonstrate success and failure completions.

image     From the diagram it is obvious that There is a primary script task in the sequence container that begins the package.  Based upon the Random Success Failure script tasks completion status will dictate whether the package flow will move onto the Success or Failure script task. 

     In order to demonstrate the execution path configure the Random Success Failure script task set the ReadWriteVariables of the task to use the Constraint variable

image      Configure the C# script to randomly define whether the task succeeds or fails with the following code:


            Random rnd
= new Random();
           
int value = rnd.Next(1, 3);
           
MessageBox.Show(value.ToString());
                
           
if (value == 1)
                
           
{
                Dts.Variables["Constraint"].Value
= "Success";
               
MessageBox.Show(Dts.Variables["Constraint"].Value.ToString());
               
Dts.TaskResult = (int)ScriptResults.Success;
           
}
               
else
           
{
                Dts.Variables["Constraint"].Value
= "Fail";
               
MessageBox.Show(Dts.Variables["Constraint"].Value.ToString());
               
Dts.TaskResult = (int)ScriptResults.Failure;
           
}

       }

     This code will dynamically assign a value between one and two that will, using if…else, define the task success or failure.  Configure each Success and Failure task to use the Constraint variable as a ReadOnly variable and add the message box Show method to display the Constraint variable value when either task executes.

image


MessageBox.Show
(Dts.Variables["Constraint"].Value.ToString());

     Executing the package will first display a message box showing the random value of 1 or 2

image

Then a message box will show the value assigned to the Constraint variable, Success if the random value is or False if 2.

image Finally a third message box is displayed from the Success or Failure script task again displaying the Constraint variable value.

imageimage       Precedence constraints based on task success, failure, or completion provides a means of execution path control, but is obviously limited.  Now let’s consider a more complicated execution path requirement.  To demonstrate this copy and paste the existing sequence container renaming the copy Expression and Constraint and connect the new container to the existing with the on success precedence constraint. 

image     With the new sequence container added consider that a new execution path is defined.  The sequence container, and scripts within, should only be run during the week and not on weekends.  This is a common requirement and can be accomplished using a precedence constraint that evaluates a variable. 

     In order to accomplish this type of execution path we must first create a variable that will be used to evaluate the day of the week.  Create a variable called WeekDay that is a data type integer and uses the following expression to define its value:
(DATEPART("dw", GETDATE()) == 1 ||  DATEPART("dw", GETDATE()) == 7)  ? false : true

     The above expression uses an if…else syntax that says if the day of the week is equal to 1, Sunday, OR the day of the week is equal to 7, Saturday, then the value is false, otherwise the value is true.  Refer to this post to look more at the if…else syntax in SSIS expressions and this post for more information on variables in SSIS.

     The current on success precedence constraint connecting our two sequence containers will need to be configured to meet the execution logic.  From within the Precedence Constraint Editor configure the constraint to use both expression and constraint and in the expression text box enter @WeekDay == True:

image      This says that the Expression and Constraint sequence container should only be executed when the Constraint Only sequence container completes successfully and the day of the week is not Saturday or Sunday.

     Finally add a message box to the Random Success Failure script task in the Constraint Only package that will display the value of the WeekDay variable:

MessageBox.Show("Is it a weekday? " + Dts.Variables["WeekDay"].Value.ToString());
             

     Executing the package results in the message boxes being displayed showing the values of the variables and also shows that if the WeekDay variable returns True then the Expression and Constraint sequence container will execute.

image      Obviously since I am working hard on a Saturday the Expression and Constraint sequence container will not execute.

image      We can now see that the precedence constraint evaluation worked and stopped despite the success of the first sequence container execution was stopped since the WeekDay variable evaluated to False.

     This post outlined the use of precedence constraints and how the execution path can be defined based upon a package, container, or tasks success, failure, or completion as well as using variables to dynamically define the execution path.  You can find documentation on MSDN that provides more information on configuring precedence constraints here.

     The sample package referenced in this post can be downloaded from my site here.

Thursday, January 24, 2013

SSIS DelayValidation Myth

     Over the past 4 days I have addressed what appears to be a common misconception regarding the DelayValidation property in SSIS.  The core of the misunderstanding is that by setting the DelayValidation property to true will improve performance since no pre-validation will be done on the container in which the property is set, but only runtime validation. 

     This is not the case at all.  The DelayValidation property does exactly what would be expected.  It stops pre-validation, but does not stop runtime validation.  When the default value of false is used in the DelayValidation property then the package, container, and or task will not be be validated upon execution of the package but rather upon runtime of the executable.  Regardless of the property setting validation will only occur once on each executable.

     To better demonstrate this I have two packages both of which have 35 data flow tasks all connected with OnSuccess precedent constraints.  The first package I have left each task with the default value of false for each tasks DelayValidation property.  By setting a break point in the first tasks Pre-Execute event and executing the package you can clearly see in the Progress tab that EVERY task, including the package, is validated.

image      The second package is identical except the DelayValidation property is set to true.  Again by setting a break point in the first tasks Pre-Execute event handler and executing you observe that only the package was validated.

image     A follow up question usually asked about package validation is if validation always occurs then why is there even a DelayValidation property on a package if it is going to validate once it is executed?  This can be an important property when a package contains an execute package task.

     By stepping through the package it is visible that each task is validated when the property is set to True, but ONLY during Pre-Execution.  In order to properly address this I have spent a great deal of time experimenting with the DelayValidation property to see of execution time differs between all tasks being set to True and False and have results have been mixed.  I somewhat expected this result since there is NO WAY around validation and time and resources for validating any executable is going to be affected by activity at the time of validation.

     So why use the DelayValidation property?  That question is easily answered.  Consider a complex package where staging tables are created in execute SQL tasks before being used in downstream data flow tasks.  During validation resolution is done to insure that the referenced objects exist.  If the task DelayValidation is set to false then the package execution immediately fails.  Since the referenced tables don’t exist in the data flow tasks, yet, validation fails at the task,  By changing the DelayValidation property to False on the appropriate tasks package execution will succeed, the objects will be created first in the execute SQL tasks, and upon runtime validation the data flow tasks succeed.

     The SSIS packages referenced in this post can be downloaded from my site here.

Wednesday, January 23, 2013

T-SQL Parameters in an SSIS Execute SQL Task

     SSIS provides several tasks that directly pass T-SQL queries directly to SQL Server.  There are numerous occasions in an SSIS package where when input needs to be passed into a T-SQL statement prior to the execution.  Several examples of this are creating dynamic statements and passing parameters to a stored procedure.  I covered creating dynamic statements using variables in SSIS in this post, so this post will focus on a bit more simplistic approach of passing parameters to execute SQL tasks.
     In order to prepare for the SSIS package pre-create the stored procedure that will be called in the AdventureWorks2012 database, the below code can be used: CREATE PROC usp_GetName @BusinessEntityID INT
AS
SELECT
FirstName,
      
MiddleName,
      
LastName

FROM Person.Person

WHERE BusinessEntityID = @BusinessEntityID;
     Create an SSIS package called T-SQLParameters and a create a package scoped variable called BusinessEntitytyID, leave the default data type of integer, but set the value to 10. 

image
Drag and drop an execute SQL task onto the control flow pane and rename it Retrieve Person Name.
     Configure the task’s SQLStatements property to pass the command EXEC usp_GetName ? and create a connection manager to an instance of SQL that has the sample database AdventureWorks2012 that has the stored procedure usp_GetName already created.
image
     The T-SQL statement is obviously is not syntactically correct based on T-SQL standards, the whole “?” thing should be a clue.  The “?” is a place holder for parameters in a statement, both input and output parameters are supported, in either execute SQL or data flow tasks.  The statement in the task is configured to execute a parameter in SSIS, but the task now needs to be configured to recognize the parameter being passed.  In the Parameter Mapping menu configure the Variable Name as BusinessEntityID, Direction as Input,  Data Type as Long, and the Parameter Name as @BusinessEntityID. 
image
The Parameter Name can either be the ordinal position in which the parameter appears, using a zero based index, in the task, input or output, or by the parameter name.  In this case there is only one parameter so the name could be replaced with “0”.
     Executing the package now will successfully execute the stored procedure usp_GetName passing the BusinessEntityID of 10 based on the valued of the associated parameter.  In order to make things more interesting let’s add an output parameter to the usp_GetName stored procedure and reconfigure the execute SQL task.  Alter the stored procedure using the below DDL statement:
ALTER PROC usp_GetName @BusinessEntityID INT, @Rows INT OUTPUT AS
SELECT
FirstName,
     
MiddleName,
     
LastName FROM Person.Person WHERE BusinessEntityID = @BusinessEntityID SELECT @Rows = @@ROWCOUNT;

     Add a variable to the package called RowCount and leave the defaults, this will be used to capture the output parameter RowCount fo the stored procedure.  The Retrieve Person Name execute SQL task needs to be modified to successfully execute.  Modify the SQLStatement property to now show:
DECLARE @RowCount INT
EXEC usp_GetName ?,  @RowCount = @RowCount OUTPUT
SELECT ? = @RowCount
The Parameter Mapping needs to be changed to take into account the OUPUT parameter so modify this by adding the RowCount variable and setting its direction as OUPUT and ordinal position to 1 in the Parameter Name property:
image     In order to verify that the output value of the query is being properly assigned drag and drop a script task onto the control flow pane and configure it to accept the RowCount as a ReadOnlyVariable.
image In the C# script body add a message box show method to display the value of the RowCount varaible value:
MessageBox.Show(Dts.Variables["RowCount"].Value.ToString());
     Executing the package will not display a message box as shown below:
image      This sample focused on an execute SQL task, but a similar technique can be used for a data flow task source using a variable.  This technique is outlined on my post SSIS - Replace Dynamic SQL With Variables.

     The sample package for this post can be downloaded from my site here.

Tuesday, January 22, 2013

SSIS Custom Logging

     In a previous post I outlined how to log and record errors, which uses the underlying log providers of SSIS.  This provides an excellent means to capture information of a package at runtime, but it does have limitations, MSDN documents logging in SSIS here.  Log providers can be used to capture specific information:

  1. Computer
  2. Operator
  3. SourceName
  4. SourceID
  5. ExecutionID
  6. MessageText
  7. DataBytes
  8. StartTime
  9. EndTime
  10. DataCode

upon the firing of the following events:

  1. OnError
  2. OnExecStatusChanged
  3. OnInformation
  4. OnPostExecute
  5. OnPostValidate
  6. OnPreExecute
  7. OnPreValidate
  8. OnProgress
  9. OnQueryCancel
  10. OnTaskFailed
  11. OnVariableValueChanged
  12. OnWarning
  13. PipelineComponentTime
  14. Diagnostic

     I am in no means trying to diminish the usefulness of native SSIS logging, but there are most definitely several areas that fall through the cracks.  The most immediate example that comes to mind is recording how many rows are affected in a data flow task or execute SQL task.  It is a common requirement to need maintain the execution time of SSIS tasks, which can easily be done using native log providers, as well as the number of affected rows for appropriate tasks.  This information can be invaluable in order to maintain a baseline of package performance.

     Custom logging first requires that a destination is created to store the log information and in this example I’m going to use a table, which is created with the below DDL script:


USE AdventureWorks2012;
GO

CREATE TABLE CustomLogging(
PackageGUID    CHAR(38),
ExecGUID       CHAR(38),
TaskName       VARCHAR(25),
RowsAffected   INT,

LogDate       DATETIME);
GO

     With the logging table created it’s now time to configure your package, which we’ll start by creating a variable called RowCount:

image      I’ll use a simple, albeit somewhat pointless, data flow task to demonstrate populating the variable and logging its information.  The below illustration shows the OLEDB data source of my Pointless data flow task:

image      In order to be able to capture the rows affected from this source to the destination add a row count task and configure it to use the RowCount variable:

image Now all that is needed is a destination, in my case I will just use a flat file destination.

image

     I want to capture the affected rows immediately on completion of the data flow task so I will use the OnPostExecute event handler to log the information to my CustomLog table using an execute SQL task.  The execute SQL task will use my OLEDB data source to AdventureWorks2012 and will us the query below:

image      Anyone familiar with T-SQL will stare questioningly at this INSERT statement pondering how this is exactly supposed to work.  The magic of this is that the “?”’s act as placeholders for the variables that will be configured as parameters, to get more information on using variables as parameters you can view this post.  The Parameter Mapping pane of the execute SQL task provides the ability to map the values from the system and user variables to the ordinal position of each one of the question marks.  The variables don’t have to necessarily be created in the order in which the values appear in the query, but the Parameter Name must specify the 0 based index value of the ordinal position in which the value is to be placed.  In this example I am going to use a mixture of system and user variables that will relate to the table columns:

System::PackageID   = PackageGUID

System::ExecutionID = ExecutionGUID

System::SourceName = TaskName

User::RowCount = RowCount

System::ContainerStartTime = LogDate

The diagram below shows how these values should be configured.

image      Once the execute SQL task is complete you are ready to run the package.  After the package is complete you can query the CustomLogging table and insure that the results were properly logged.

SELECT *
FROM CustomLogging;
GO

Results {0270D6BB-FCC8-43C4-BBF4-CAB5DE031350}    {9A8AE9FA-F2FA-41D2-B640-335D787D76B4}    PointlessFlow    19972    1900-01-01 00:00:00.000

     This above example displayed how to configure custom logging for data flows to capture information that is not included in SSIS native log providers.  Obviously there may be times when execute SQL tasks are used rather than a data flow and there is still the ability to record affected rows, but rather than a row count an output parameter will have to be used.

     The below illustrates a query in an execute SQL task that can be used to assign the @@ROWCOUNT to the RowCount variable.

 image The only thing left to do is configure the RowCount variable as an output parameter in the execute SQL task

image      To log the results follow the same steps above configuring the OnPostExecute event handler of the execute SQL task.

     The sample package for this post can be downloaded from my site here.