Monday, September 28, 2015

New management role for SSIS 2016

SSIS 2016 management

Working in regulated environments has presented some challenges with the SQL Server Integration Services (SSIS) project deployment model, which was introduced with the 2012 release of SQL Server. Specifically, there was no way to use the native reporting and dashboard features of the Integration Services catalog without being a system administrator or having rights to administer the SSISDB catalog. That is a clear violation of the separation/segregation of duties policies required by the major compliance regulations.

The upcoming release of SQL Server 2016 provides a new built-in role, ssis_logreader, that should allow support staff to support SSIS without violating the separation of duties requirement. They will be able to view the history for a package - its average, minimum and maximum durations along with run frequencies - as well as delve into any errors associated with those runs, all from the comfort of the native reports built into SQL Server Management Studio. No longer will they need to write custom queries and master the object model.
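Granting the new role should look much like granting membership in any other SSISDB database role. Here is a rough sketch; the account name is a placeholder, and it assumes the support login does not yet have a user in SSISDB.

Use SSISDB;
GO
-- Placeholder account; substitute your support team's login.
Create user [DOMAIN\SSISSupport] for login [DOMAIN\SSISSupport];
GO
-- ssis_logreader is the new built-in role in the 2016 catalog.
Alter role ssis_logreader add member [DOMAIN\SSISSupport];
GO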

We see this as an excellent step forward in the maturation of the administrative experience for SSIS.

Thursday, September 3, 2015

SSIS 2016 Incremental Package Deployment

SSIS 2016 Incremental Package Deployment

The community technology preview (CTP) 2.3 for SQL Server 2016 is the first in the series of CTPs to deliver Business Intelligence functionality. Specific to SSIS, this release delivers support for the OData v4 protocol, error column names, custom logging levels and the topic of this post: incremental package deployment.

Project Deployment Model

The classic problem with SSIS projects was that, once development was complete, nothing linked the packages to the business problem they were designed to solve. To get sales data loaded, we need both the header data and the detail data populated or we have an incomplete order. The 2012 release of Integration Services sought to address that by delivering a new deployment model: the project deployment model. This allowed a solution to be compiled into a single deployable unit, the ispac, which ensures that all the pieces that were developed are delivered together.

An aspect of this new deployable unit was that redeployments required the complete ispac to be reinstalled. One could not just deploy an updated version of the order detail processing package by itself. The updated package had to be compiled into an ispac and the new ispac replaced the existing one. For organizations that struggle with change control, this can appear to be a much larger change than it truly is.
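For context, a whole-project redeployment against the existing catalog can be scripted with catalog.deploy_project, something like the sketch below. The folder, project and file names are placeholders; the procedure takes the full binary contents of the ispac.

Use SSISDB;
GO
-- Read the entire ispac into a varbinary(max); the path is a placeholder.
Declare @project_binary varbinary(max) =
    (select BulkColumn
     from openrowset(bulk N'C:\Deploy\SalesETL.ispac', single_blob) as ispac);

-- Redeploying replaces the whole project, even for a one-package change.
Exec catalog.deploy_project
      @folder_name    = N'Sales'
    , @project_name   = N'SalesETL'
    , @project_stream = @project_binary;
GO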

Incremental package deployment

The upcoming 2016 release of SQL Server provides a path for making incremental updates to an existing SSIS project. Individual packages can now be deployed into a project in a piecemeal fashion. This can greatly simplify a change control request, as only the packages that have been modified will be deployed into the project.

The UI defaults to "Project Deployment" which consumes an ispac file, as it always has.

Choosing "Package Deployment" allows for the selection of a folder that contains the updated SSIS package(s).


Wednesday, March 11, 2015

Temp Tables vs Table Variables -- A Classic Just Became Clearer

Temp Tables vs Table Variables 



If you Google “Temp Tables vs Table Variables” you will come up with a myriad of results, all of which contain excellent information. Some examples are:
SQL Server Planet – gives a fantastic matrix of the various things that can be accomplished with each option
Stackoverflow - Rory and Stackoverflow - Martin Smith – each of which gives you insight into the advantages and disadvantages of each: things like scoping, transactional behavior, indexing, recompilation, locking and statistics. This is a tremendous amount of information and can really help you learn the internals of the two options.
Code Project – gives a nice little run-through of the performance of each type, with time to complete for varying record set lengths
Pretty much anything you could want to know about temp tables and table variables you can find in one of these links.
What I want to discuss here is why it makes such a big difference on the larger data sets, which each of the above links clearly states is where you do not want to use table variables. So below I have a very simple set of queries that demonstrate what I see as the biggest issue with using table variables against larger data sets.
First I will create my table variable and my temp table with the same dataset.
Use AdventureWorks2012;
GO
Declare @tbl table
      (SalesOrderID int
      , CarrierTrackingNumber varchar(40)
      , OrderQty int
      , ProductId int
      , UnitPrice float);

insert into @tbl
select SalesOrderID
, CarrierTrackingNumber
, OrderQty
, ProductID
, UnitPrice
from [Sales].[SalesOrderDetail];

if object_id('tempdb..#t') is not null
      drop table #t;
Create table #t
      (SalesOrderID int
      , CarrierTrackingNumber varchar(40)
      , OrderQty int
      , ProductId int
      , UnitPrice float);

insert into #t
select SalesOrderID
, CarrierTrackingNumber
, OrderQty
, ProductID
, UnitPrice
from [Sales].[SalesOrderDetail];


And from there I will select out the records, doing a simple join to a second table, Sales.SalesOrderHeader, on the key held in the temp table or table variable.

Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from @tbl t
join Sales.SalesOrderHeader  soh on t.SalesOrderID = soh.SalesOrderID;

Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from #t t
join Sales.SalesOrderHeader  soh on t.SalesOrderID = soh.SalesOrderID;

Using SQL Sentry Plan Explorer, we can see first off that the estimated cost for the select against the table variable is less than that of the select against the temp table. In my case, this is against a table of roughly 120,000 rows.




Don’t listen to that, because it is not always accurate. The cost is just an estimate, even when you are looking at the actual execution plan.
So, looking at the query plans, we see the following for the table variable:



This looks like a pretty good plan, right? A clustered index seek on the real table, nested loops to bring it together…nothing alarming about this at first glance.
Heck, compare it to the plan for the temp table:


And it almost makes the table variable seem better. The temp table plan has a table scan against the real table, a hash match, and even that darn warning (in this case it is just a missing index suggestion, but at first glance you might think this plan is worse).
HOWEVER, this is actually the problem. The estimated cost makes the table variable seem better, and its plan even kind of looks better at first glance, but the reality is that it is a MUCH WORSE plan. And why does the optimizer do this?



Because that table variable ALWAYS has an estimated row count of 1 row if you do not put optimizer hints on the query. And because of that, the optimizer will assume that doing index seeks instead of scans is a much better approach. In a case like this one, where we have over 100,000 rows, that is most assuredly NOT the case.
That index seek with the nested loops to bring it together ends up doing over 360,000 reads in total, where the table scan with the hash match and the “warning” only does 1,246.
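If you do not have Plan Explorer handy, you can see the same read counts with a quick SET STATISTICS IO ON sketch like the one below; the logical reads per table show up in the Messages tab, and your exact numbers will differ from mine.

-- Run this in the same batch as the declarations above; the table
-- variable goes out of scope as soon as its batch ends.
Set statistics io on;

Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from @tbl t
join Sales.SalesOrderHeader soh on t.SalesOrderID = soh.SalesOrderID;

Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from #t t
join Sales.SalesOrderHeader soh on t.SalesOrderID = soh.SalesOrderID;

Set statistics io off;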



We can fairly easily remedy this situation by putting a simple Option (Recompile) at the end of the query that accesses the table variable, as you can see below. And when that is done, the query plan the optimizer creates ends up being exactly the same as that of the temp table.
Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from @tbl t
join Sales.SalesOrderHeader  soh on t.SalesOrderID = soh.SalesOrderID
option (recompile);
This is a good solution, but it comes at the expense of expecting everyone who writes these queries to know this fact and how to get around it. There are apparently other ways to make a table variable do essentially the same thing as Option (Recompile), as discussed at the bottom of the SQL Server Central article Mythbusting - Table variables have no statistics by the writer and Brent Ozar. In the end, though, we have to weigh the odds of us and our teams remembering to put that optimizer hint on a query, or actually knowing the various outlier scenarios that cause table variables to work this way.

Table variables have their place, but that estimated row count of 1, absent additional help to the optimizer, can definitely have an impact on the query plans the optimizer will choose. So when you are making that decision, take this into account and choose wisely; it can have a huge impact.