Wednesday, March 11, 2015

Temp Tables vs Table Variables -- A Classic Just Became Clearer

If you Google “Temp Tables vs Table Variables” you will come up with a myriad of results, all of which contain excellent information. Some examples are:
- SQL Server Planet – gives a fantastic matrix of the various things that can be accomplished with each option.
- Stackoverflow - Rory and Stackoverflow - Martin Smith – each gives insight into the advantages and disadvantages of each option: things like scoping, transactional behavior, indexing, recompilation, locking, and statistics. This is a tremendous amount of information and can really help you learn the internals of the two options.
- Code Project – gives a nice little run-through of the performance of each type, with time to complete for varying record set lengths.
Pretty much anything you could want to know about temp tables and table variables can be found in one of these links.
What I want to discuss here is why the difference is so big on larger data sets, which, as each of the links above states, is exactly where you do not want to use table variables. So below I have a very simple set of queries that demonstrates what I see as the biggest issue with using table variables against larger datasets.
First I will create my table variable and my temp table with the same dataset.
Use AdventureWorks2012;
GO
-- Table variable version
Declare @tbl table (SalesOrderID int
      , CarrierTrackingNumber varchar(40)
      , OrderQty int
      , ProductId int
      , UnitPrice float);

insert into @tbl
select SalesOrderID
, CarrierTrackingNumber
, OrderQty
, ProductID
, UnitPrice
from [Sales].[SalesOrderDetail];

-- Temp table version (dropping it first if it is left over from a previous run)
if object_id('tempdb..#t') is not null
      drop table #t;
Create table #t
      (SalesOrderID int
      , CarrierTrackingNumber varchar(40)
      , OrderQty int
      , ProductId int
      , UnitPrice float);

insert into #t
select SalesOrderID
, CarrierTrackingNumber
, OrderQty
, ProductID
, UnitPrice
from [Sales].[SalesOrderDetail];
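As a quick sanity check that both objects hold the same dataset, you can count the rows. This is a minimal sketch; note that the table variable count must run in the same batch as its DECLARE, since that is the table variable's entire scope.

-- Both counts should match the row count of Sales.SalesOrderDetail
select count(*) as TableVariableRows from @tbl;
select count(*) as TempTableRows from #t;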


And from there I will select the records back out, doing a simple join from the temp table or table variable over to a second table, Sales.SalesOrderHeader, on a column carried in both.

Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from @tbl t
join Sales.SalesOrderHeader soh on t.SalesOrderID = soh.SalesOrderID;

Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from #t t
join Sales.SalesOrderHeader soh on t.SalesOrderID = soh.SalesOrderID;

Using SQL Sentry Plan Explorer, we can see first off that the estimated cost for the select against the table variable is lower than that of the select against the temp table. In my case, this is against a table of 120,000 rows.

Don’t put too much faith in that number, because it is not always accurate. The cost is just an estimate, even when you are looking at the actual execution plan.
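If you want to check the optimizer's row estimates yourself without a third-party tool, one option (a minimal sketch, assuming the temp table created above still exists) is to request the estimated plan without executing the query; in SSMS you can also simply hover over the plan operators.

-- SET SHOWPLAN_XML must be the only statement in its batch.
-- The query below is NOT executed; SQL Server returns the estimated plan XML,
-- where each operator carries an EstimateRows attribute.
SET SHOWPLAN_XML ON;
GO
Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from #t t
join Sales.SalesOrderHeader soh on t.SalesOrderID = soh.SalesOrderID;
GO
SET SHOWPLAN_XML OFF;
GO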
So, looking at the query plans, we see the following for the table variable:



This looks like a pretty good plan, right?   Clustered index seek on the real table, nested loops to bring it together…nothing alarming about this at first glance.
Heck, compare it to the plan for the temp table:


And the comparison almost makes the table variable plan seem better. The temp table plan has a table scan against the real table, a hash match, and it even has that darn warning (in this case it is just a missing index suggestion, but at first glance you might think this plan is worse).
HOWEVER, this is actually the problem. The estimated cost makes the table variable plan seem better, and the plan even kind of looks better at first glance, but in reality it is a MUCH WORSE plan. And why does the optimizer do this?



Because that table variable ALWAYS has an estimated row count of 1 row if you do not put optimizer hints on the query. And because of that, the optimizer assumes that doing index seeks instead of scans is a much better approach. In a case like this one, where we have over 100,000 rows, that is most assuredly NOT the case.
That index seek with the nested loops to bring it together ends up doing over 360,000 reads in total, whereas the table scan with the hash match and the “warning” does only 1,246.
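You can reproduce these read counts yourself with SET STATISTICS IO, which reports the logical reads per table in the Messages tab. A minimal sketch against the temp table is below; exact numbers will vary by environment, and the table variable version must run in the same batch as its DECLARE.

SET STATISTICS IO ON;

-- Check the "logical reads" values reported for #t and SalesOrderHeader
Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from #t t
join Sales.SalesOrderHeader soh on t.SalesOrderID = soh.SalesOrderID;

SET STATISTICS IO OFF;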



We can fairly easily remedy this situation by putting a simple OPTION (RECOMPILE) at the end of the query that accesses the table variable, as you can see below. When that is done, the query plan the optimizer creates ends up being exactly the same as the temp table's.
Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from @tbl t
join Sales.SalesOrderHeader soh on t.SalesOrderID = soh.SalesOrderID
option (recompile);
This is a good solution, but it comes at the expense of expecting everyone who writes these queries to know this fact and how to work around it. There are other ways to make a table variable behave essentially the same as OPTION (RECOMPILE), as discussed at the bottom of the SQL Server Central article Mythbusting - Table variables have no statistics by the writer and Brent Ozar, but in the end we have to weigh the odds of us and our teams remembering to either put that optimizer hint on the query or actually know the various outlier scenarios that cause table variables to work this way.
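One such alternative, assuming SQL Server 2012 SP2 or later and worth testing in your own environment before relying on it, is trace flag 2453, which lets table variable row counts trigger recompiles much like a temp table would, without editing every query:

-- Trace flag 2453 (SQL Server 2012 SP2+): changes in a table variable's row
-- count can trigger recompiles, similar to a temp table. The -1 makes it
-- global, which affects the whole instance, so use with care.
DBCC TRACEON (2453, -1);

-- ... run the table variable query as before ...

DBCC TRACEOFF (2453, -1);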

Table variables have their place, but that estimated row count of 1, absent additional help for the optimizer, can definitely have an impact on the query plans the optimizer chooses. So when you are making that decision, take this into account and choose wisely; it can have a huge impact.