Tuesday, April 17, 2012

SQL Temporary Tables vs. Table Variables

If you ever encountered the need to perform some calculations in a stored procedure, this one’s for you J
Today I am going to talk about the options we have to perform such calculations within side tables in the database.

Basically, we have 3 alternative ways to store such data-
Temporary Tables (#Table), Table Variables (@Table) and Permanent Tables (Table).

Usually, the most intuitive thing to do is to use a temporary table, without giving it much thought in terms of performance. I must admit that I do it myself from time to time. Perhaps the reason for that is that temporary tables have been around longer. But, are they really the best thing to use?
As you’ve probably already guessed, the answer here is not unequivocally. All 3 alternatives have their advantages and disadvantages. So let’s look into it…

Some Basic Facts:
Table variables are pretty similar to temporary tables in concept, but in spite of these similarities, there are also some notable differences between them.
Let’s have a look:

·         Temp tables are essential mostly for data manipulation, whereas table variables were designed primarily for the use of table-valued functions.
·         Temp tables can be altered with DDL statements but table variables can't (for example: You can’t create a Nonclustered Index on a table variable).
·         Transactions on table variables are usually shorter than transactions involving temp tables and require less locking.
·         SQL Server creates statistics for temp tables, which eventually helps the query optimizer to choose the relevant execution plan, as opposed to table variables. This causes less recompiles of stored procedures involving table variables.

Great! So let’s always use table variables. Seems the least expensive, doesn’t it?
Mmm… Not exactly. I once heard that when something sounds too good to be true, it probably is. As a "rule of thumb" I can say that in simple queries we would rather use table variables, whereas in long/complex ones we prefer temp tables.
In the following example I will show you why:

Example:
I’ve performed 2 tests, which count the amount of even numbers in every table and calculates the execution time.
I did this for 2 types of calculations – Simple and Joined.
These tests will be executed first on small amounts of data and then on large amounts of data, so we could see the difference more clearly.

1.  First we create our test data (1 million random values):
CREATE TABLE TestData (Id INT IDENTITY PRIMARY KEY, VALUE INT)

-- Filling TestData with 1 million random rows to work with
INSERT INTO TestData (VALUE)
SELECT TOP (1000000) ABS(CHECKSUM(NEWID()))/1000
FROM sys.all_columns CROSS JOIN sys.all_columns c1

--Creating also our permanent table on which we will be performing calculations
CREATE TABLE dbo.Test (VALUE INT)
                                          

2.  We will also create a table that will hold the final test results:
CREATE TABLE TestResults
(TestType VARCHAR(10), TableType VARCHAR(20), NumOfRows INTExecTime_ms INT)

3.  Now we create the “Simple” stored procedures.
All they do is calculate the amount of even numbers within the tables:
CREATE PROCEDURE usp_SimplePermanent
   @NumOfRows INT
AS
BEGIN
   TRUNCATE TABLE dbo.Test
   INSERT INTO dbo.Test SELECT TOP (@NumOfRows) VALUE FROM TestData
   --Calculate amount of even numbers
   SELECT COUNT(1) FROM dbo.Test WHERE (VALUE % 2 = 0)
END
-------------------------------------------------------------------------
CREATE PROCEDURE usp_SimpleTemporary
   @NumOfRows INT
AS
BEGIN
     CREATE TABLE #Test (VALUE INT)
     INSERT INTO #Test SELECT TOP (@NumOfRows) VALUE FROM TestData
     --Calculate amount of even numbers
     SELECT COUNT(1) FROM #Test WHERE (VALUE % 2 = 0)
END
-------------------------------------------------------------------------
CREATE PROCEDURE usp_SimpleTableVar
   @NumOfRows INT
AS
BEGIN
     DECLARE @Test TABLE (VALUE INT)
     INSERT INTO @Test SELECT TOP (@NumOfRows) VALUE FROM TestData
     --Calculate amount of even numbers
     SELECT COUNT(1) FROM @Test WHERE (VALUE % 2 = 0)
END

4.  In the same matter, we create the “Joined” stored procedures (same as above, but with a more complex query – This one joins between our table type and the full 1M table):
CREATE PROCEDURE usp_JoinedPermanent
   @NumOfRows INT
AS
BEGIN
     TRUNCATE TABLE dbo.Test
     INSERT INTO dbo.Test SELECT TOP (@NumOfRows) VALUE FROM TestData
     --Calculate amount of even numbers
     SELECT COUNT(1)
     FROM dbo.Test t
       INNER JOIN dbo.TestData td ON t.VALUE = td.Id --forcing a table scan
     WHERE (t.VALUE % 2 = 0)
END
-------------------------------------------------------------------------
CREATE PROCEDURE usp_JoinedTemporary   
   @NumOfRows INT
AS
BEGIN
     CREATE TABLE #Test (VALUE INT)
     INSERT INTO #Test SELECT TOP (@NumOfRows) VALUE FROM TestData
     --Calculate amount of even numbers
     SELECT COUNT(1)
     FROM #Test t
       INNER JOIN dbo.TestData td ON t.VALUE = td.Id --forcing a table scan
     WHERE (t.VALUE % 2 = 0)
END
-------------------------------------------------------------------------
CREATE PROCEDURE usp_JoinedTableVar
   @NumOfRows INT
AS
BEGIN
     DECLARE @Test TABLE (VALUE INT)
     INSERT INTO @Test SELECT TOP (@NumOfRows) VALUE FROM TestData
     --Calculate amount of even numbers
     SELECT COUNT(1)
     FROM @Test t
       INNER JOIN dbo.TestData td ON t.VALUE = td.Id --forcing a table scan
     WHERE (t.VALUE % 2 = 0)
END

5.  Now we will create the procedures that will activate our tests:
CREATE PROC usp_ExecuteSimpleTests
(
     @NumOfRows INT, -- 10 / 1000000
     @TableType VARCHAR(20) -- 'Temporary' / 'TableVar' / 'Permanent'
)     
AS
BEGIN
   DECLARE @sql NVARCHAR(2000)

   SET @sql =
   'DECLARE
          @i INT = 1,
          @n INT = 10, --run 10 iterations on every table type and use AVG results
          @Begin DATETIME2 ,
          @End DATETIME2 ,
          @TotalTime_ms  INT = 0

   WHILE (@i <= @n)
   BEGIN
          DBCC FREEPROCCACHE;
          SET @Begin = SYSDATETIME()

          EXEC usp_Simple' @TableType + ' @NumOfRows = '+CAST(@NumOfRows AS VARCHAR)

          SET @End = SYSDATETIME()
          SET @TotalTime_ms += DATEDIFF(ms, @Begin, @End)
          SET @i = @i + 1
   END

   INSERT INTO TestResults (TestType, TableType, NumOfRows, ExecTime_ms)
   SELECT ''Simple'', '''+@TableType+''', '+CAST(@NumOfRows AS VARCHAR)+
          ', @TotalTime_ms/@n'

   EXEC sp_executesql @sql
END
-------------------------------------------------------------------------
CREATE PROC usp_ExecuteJoinedTests
(
     @NumOfRows INT, -- 10 / 1000000
     @TableType VARCHAR(20) -- 'Temporary' / 'TableVar' / 'Permanent'
)     
AS
BEGIN
   DECLARE @sql NVARCHAR(2000)

   SET @sql =
   'DECLARE
          @i INT = 1,
          @n INT = 10, --run 10 iterations on every table type and use AVG results
          @Begin DATETIME2 ,
          @End DATETIME2 ,
          @TotalTime_ms  INT = 0

   WHILE (@i <= @n)
   BEGIN
          SET @Begin = SYSDATETIME()

          EXEC usp_Joined' @TableType + ' @NumOfRows = '+CAST(@NumOfRows AS VARCHAR)

          SET @End = SYSDATETIME()
          SET @TotalTime_ms += DATEDIFF(ms, @Begin, @End)
          SET @i = @i + 1
   END

   INSERT INTO TestResults (TestType, TableType, NumOfRows, ExecTime_ms)
   SELECT ''Joined'', '''+@TableType+''', '+CAST(@NumOfRows AS VARCHAR)+
          ', @TotalTime_ms/@n'

   EXEC sp_executesql @sql
END



6. Let’s execute the tests…
NOTE: For the first test we don't want to use the cache (this will affect our results). We want a "clean" execution plan for every run. Therefore, we will clean the cache before every run.
The second, obviously, will perform differently when relying on the cache (it will use the pre-calculated statistics). Therefore, we will not clean our cache here:
TRUNCATE TABLE TestResults
DBCC FREEPROCCACHE;

EXEC usp_ExecuteSimpleTests @NumOfRows = 10, @TableType = 'Permanent'
DBCC FREEPROCCACHE;
EXEC usp_ExecuteSimpleTests @NumOfRows = 10, @TableType = 'Temporary'
DBCC FREEPROCCACHE;
EXEC usp_ExecuteSimpleTests @NumOfRows = 10, @TableType = 'TableVar'
DBCC FREEPROCCACHE;
EXEC usp_ExecuteSimpleTests @NumOfRows = 1000000, @TableType = 'Permanent'
DBCC FREEPROCCACHE;
EXEC usp_ExecuteSimpleTests @NumOfRows = 1000000, @TableType = 'Temporary'
DBCC FREEPROCCACHE;
EXEC usp_ExecuteSimpleTests @NumOfRows = 1000000, @TableType = 'TableVar'
GO

EXEC usp_ExecuteJoinedTests @NumOfRows = 10, @TableType = 'Permanent'
EXEC usp_ExecuteJoinedTests @NumOfRows = 10, @TableType = 'Temporary'
EXEC usp_ExecuteJoinedTests @NumOfRows = 10, @TableType = 'TableVar'
EXEC usp_ExecuteJoinedTests @NumOfRows = 1000000, @TableType = 'Permanent'
EXEC usp_ExecuteJoinedTests @NumOfRows = 1000000, @TableType = 'Temporary'
EXEC usp_ExecuteJoinedTests @NumOfRows = 1000000, @TableType = 'TableVar'
GO

SELECT * FROM TestResults


  Results:
Simple query:

TableType
NumOfRows
ExecTime (ms)
Permanent
10
215
Temporary
10
21
TableVar
10
18
Permanent
1,000,000
6,447
Temporary
1,000,000
2,188
TableVar
1,000,000
573

It is pretty clear that the table variable has much better performance than the others.

Joined Query:

TableType
NumOfRows
ExecTime(MS)
Permanent
10
11
Temporary
10
0
TableVar
10
0
Permanent
1,000,000
6,182
Temporary
1,000,000
3,024
TableVar
1,000,000
5,644

In this case, as you can see, the temporary table is the winner.

So what is going on?
Simple Query:
Since the execution plan in this case is always the same, statistics have no meaning here and updating them creates an overhead. This is where the table variable has an advantage over the other tables.
Joined Query:
Now statistics have a huge influence on the execution plan. Since we joined our table types with the large table (1M rows), the optimizer needs statistics to figure out which is the best plan to use.

Conclusion:
According to the tests above, it is pretty clear that the permanent table is not good in any case for such calculations. It has too much of an overhead.
As stated before, the table variable seems to be a better choice for simple calculations when there is only one way to execute the procedure, and the temp table would be the best choice when the procedure can be executed in more than one way.

To conclude, there is no universal rule of when and where to use temp tables or table variables. Try them both and experiment in both sides of the spectrum – small and large amounts of records.
I, from my personal experience, would rather go for temp tables. Usually we already have the plan in cache, so we don’t have to worry about recompilation. Moreover, the advantage the table variable has over the temp table in simple calculations is minor compared to the advantage the temporary table has for complex ones.

Good Luck! J

2 comments:

  1. wow nice example

    ReplyDelete
  2. Basic rule of thumb as per this blog:
    http://blogs.davelozinski.com/curiousconsultant/optimizing-sql-strategies
    When using small amounts of data (eg, a few rows), table variables are great! Otherwise, #temp tables are the way to go.

    ReplyDelete