If you ever encountered
the need to perform some calculations in a stored procedure, this one’s
for you J
Today I am going to talk
about the options we have to perform such calculations within side tables in
the database.
Basically, we have 3
alternative ways to store such data-
Temporary Tables (#Table),
Table Variables (@Table) and Permanent Tables (Table).
Usually, the most intuitive
thing to do is to use a temporary table, without giving it much thought in
terms of performance. I must admit that I do it myself from time to time.
Perhaps the reason for that is that temporary tables have been around longer.
But, are they really the best thing to use?
As you’ve probably
already guessed, the answer here is not unequivocally. All 3 alternatives have
their advantages and disadvantages. So let’s look into it…
Some Basic Facts:
Table variables are
pretty similar to temporary tables in concept, but in spite of these
similarities, there are also some notable differences between them.
Let’s have a look:
Let’s have a look:
·
Temp tables are
essential mostly for data manipulation, whereas table variables were
designed primarily for the use of table-valued functions.
·
Temp tables can
be altered with DDL statements but table variables can't (for
example: You can’t create a Nonclustered Index on a table variable).
·
Transactions on table variables are usually shorter than
transactions involving temp tables and require less locking.
·
SQL Server creates statistics for temp tables, which
eventually helps the query optimizer to choose the relevant execution plan, as
opposed to table variables. This causes less recompiles of
stored procedures involving table variables.
Great! So let’s always
use table variables. Seems the least expensive, doesn’t it?
Mmm… Not exactly. I
once heard that when something sounds too good to be true, it probably is. As a
"rule of thumb" I can say that in simple queries we would rather
use table variables, whereas in long/complex ones we prefer temp tables.
In the following
example I will show you why:
Example:
I’ve performed 2 tests,
which count the amount of even numbers in every table and calculates the
execution time.
I did this for 2 types
of calculations – Simple and Joined.
These tests will be executed
first on small amounts of data and then on large amounts of data, so we could
see the difference more clearly.
1. First
we create our test data (1 million random values):
CREATE TABLE TestData (Id INT IDENTITY PRIMARY KEY, VALUE INT)
-- Filling TestData with 1 million random rows
to work with
INSERT INTO TestData (VALUE)
SELECT TOP (1000000) ABS(CHECKSUM(NEWID()))/1000
FROM sys.all_columns c CROSS JOIN sys.all_columns c1
--Creating also our permanent table on which we
will be performing calculations
CREATE TABLE dbo.Test (VALUE INT)
2. We will
also create a table that will hold the final test results:
CREATE TABLE TestResults
(TestType VARCHAR(10), TableType VARCHAR(20), NumOfRows INT, ExecTime_ms INT)
3. Now we
create the “Simple” stored procedures.
All they do is calculate
the amount of even numbers within the tables:
CREATE PROCEDURE usp_SimplePermanent
@NumOfRows INT
AS
BEGIN
TRUNCATE TABLE dbo.Test
INSERT INTO dbo.Test SELECT TOP (@NumOfRows) VALUE FROM TestData
--Calculate amount of even numbers
SELECT COUNT(1) FROM dbo.Test WHERE (VALUE % 2 = 0)
END
-------------------------------------------------------------------------
CREATE PROCEDURE usp_SimpleTemporary
@NumOfRows INT
AS
BEGIN
CREATE TABLE #Test (VALUE INT)
INSERT INTO #Test SELECT TOP (@NumOfRows) VALUE FROM TestData
--Calculate amount of even numbers
SELECT COUNT(1) FROM #Test WHERE (VALUE % 2 = 0)
END
-------------------------------------------------------------------------
CREATE PROCEDURE usp_SimpleTableVar
@NumOfRows INT
AS
BEGIN
DECLARE @Test TABLE (VALUE INT)
INSERT INTO @Test SELECT TOP (@NumOfRows) VALUE FROM TestData
--Calculate amount of even numbers
SELECT COUNT(1) FROM @Test WHERE (VALUE % 2 = 0)
END
4. In the
same matter, we create the “Joined” stored procedures (same as above, but with
a more complex query – This one joins between our table type and the full 1M
table):
CREATE PROCEDURE usp_JoinedPermanent
@NumOfRows INT
AS
BEGIN
TRUNCATE TABLE dbo.Test
INSERT INTO dbo.Test SELECT TOP (@NumOfRows) VALUE FROM TestData
--Calculate amount of even numbers
SELECT COUNT(1)
FROM dbo.Test t
INNER JOIN dbo.TestData td ON t.VALUE = td.Id --forcing a table scan
WHERE (t.VALUE % 2 = 0)
END
-------------------------------------------------------------------------
CREATE PROCEDURE usp_JoinedTemporary
@NumOfRows INT
AS
BEGIN
CREATE TABLE #Test (VALUE INT)
INSERT INTO #Test SELECT TOP (@NumOfRows) VALUE FROM TestData
--Calculate amount of even numbers
SELECT COUNT(1)
FROM #Test t
INNER JOIN dbo.TestData td ON t.VALUE = td.Id --forcing a table scan
WHERE (t.VALUE % 2 = 0)
END
-------------------------------------------------------------------------
CREATE PROCEDURE usp_JoinedTableVar
@NumOfRows INT
AS
BEGIN
DECLARE @Test TABLE (VALUE INT)
INSERT INTO @Test SELECT TOP (@NumOfRows) VALUE FROM TestData
--Calculate amount of even numbers
SELECT COUNT(1)
FROM @Test t
INNER JOIN dbo.TestData td ON t.VALUE = td.Id --forcing a table scan
WHERE (t.VALUE % 2 = 0)
END
5. Now we
will create the procedures that will activate our tests:
CREATE PROC usp_ExecuteSimpleTests
(
@NumOfRows INT, -- 10 / 1000000
@TableType VARCHAR(20) -- 'Temporary' /
'TableVar' / 'Permanent'
)
AS
BEGIN
DECLARE @sql NVARCHAR(2000)
SET @sql =
'DECLARE
@i INT = 1,
@n INT = 10, --run 10 iterations on
every table type and use AVG results
@Begin DATETIME2 ,
@End DATETIME2 ,
@TotalTime_ms INT = 0
WHILE (@i <= @n)
BEGIN
DBCC FREEPROCCACHE;
SET @Begin = SYSDATETIME()
EXEC usp_Simple' + @TableType + ' @NumOfRows = '+CAST(@NumOfRows AS VARCHAR)
SET @End = SYSDATETIME()
SET @TotalTime_ms += DATEDIFF(ms, @Begin, @End)
SET @i = @i + 1
END
INSERT INTO TestResults (TestType,
TableType, NumOfRows, ExecTime_ms)
SELECT ''Simple'', '''+@TableType+''', '+CAST(@NumOfRows AS VARCHAR)+
', @TotalTime_ms/@n'
', @TotalTime_ms/@n'
EXEC sp_executesql @sql
END
-------------------------------------------------------------------------
CREATE PROC usp_ExecuteJoinedTests
(
@NumOfRows INT, -- 10 / 1000000
@TableType VARCHAR(20) -- 'Temporary' /
'TableVar' / 'Permanent'
)
AS
BEGIN
DECLARE @sql NVARCHAR(2000)
SET @sql =
'DECLARE
@i INT = 1,
@n INT = 10, --run 10 iterations on
every table type and use AVG results
@Begin DATETIME2 ,
@End DATETIME2 ,
@TotalTime_ms INT = 0
WHILE (@i <= @n)
BEGIN
SET @Begin = SYSDATETIME()
EXEC usp_Joined' + @TableType + ' @NumOfRows = '+CAST(@NumOfRows AS VARCHAR)
SET @End = SYSDATETIME()
SET @TotalTime_ms += DATEDIFF(ms, @Begin, @End)
SET @i = @i + 1
END
INSERT INTO TestResults (TestType,
TableType, NumOfRows, ExecTime_ms)
SELECT ''Joined'', '''+@TableType+''', '+CAST(@NumOfRows AS VARCHAR)+
', @TotalTime_ms/@n'
', @TotalTime_ms/@n'
EXEC sp_executesql @sql
END
6. Let’s execute
the tests…
NOTE: For the first test
we don't want to use the cache (this will affect our results). We want a
"clean" execution plan for every run. Therefore, we will clean the
cache before every run.
The second, obviously,
will perform differently when relying on the cache (it will use the
pre-calculated statistics). Therefore, we will not clean our cache here:
TRUNCATE TABLE TestResults
TRUNCATE TABLE TestResults
DBCC FREEPROCCACHE;
EXEC usp_ExecuteSimpleTests @NumOfRows = 10, @TableType = 'Permanent'
DBCC FREEPROCCACHE;
EXEC usp_ExecuteSimpleTests @NumOfRows = 10, @TableType = 'Temporary'
DBCC FREEPROCCACHE;
EXEC usp_ExecuteSimpleTests @NumOfRows = 10, @TableType = 'TableVar'
DBCC FREEPROCCACHE;
EXEC usp_ExecuteSimpleTests @NumOfRows = 1000000, @TableType = 'Permanent'
DBCC FREEPROCCACHE;
EXEC usp_ExecuteSimpleTests @NumOfRows = 1000000, @TableType = 'Temporary'
DBCC FREEPROCCACHE;
EXEC usp_ExecuteSimpleTests @NumOfRows = 1000000, @TableType = 'TableVar'
GO
EXEC usp_ExecuteJoinedTests @NumOfRows = 10, @TableType = 'Permanent'
EXEC usp_ExecuteJoinedTests @NumOfRows = 10, @TableType = 'Temporary'
EXEC usp_ExecuteJoinedTests @NumOfRows = 10, @TableType = 'TableVar'
EXEC usp_ExecuteJoinedTests @NumOfRows = 1000000, @TableType = 'Permanent'
EXEC usp_ExecuteJoinedTests @NumOfRows = 1000000, @TableType = 'Temporary'
EXEC usp_ExecuteJoinedTests @NumOfRows = 1000000, @TableType = 'TableVar'
GO
SELECT * FROM TestResults
Results:
Simple query:
TableType
|
NumOfRows
|
ExecTime (ms)
|
Permanent
|
10
|
215
|
Temporary
|
10
|
21
|
TableVar
|
10
|
18
|
Permanent
|
1,000,000
|
6,447
|
Temporary
|
1,000,000
|
2,188
|
TableVar
|
1,000,000
|
573
|
It is pretty clear that the table variable has much better performance than the others.
Joined Query:
TableType
|
NumOfRows
|
ExecTime(MS)
|
Permanent
|
10
|
11
|
Temporary
|
10
|
0
|
TableVar
|
10
|
0
|
Permanent
|
1,000,000
|
6,182
|
Temporary
|
1,000,000
|
3,024
|
TableVar
|
1,000,000
|
5,644
|
In this case, as you can see, the temporary table is the winner.
So what is going on?
Simple Query:
Since the execution plan
in this case is always the same, statistics have no meaning here and updating
them creates an overhead. This is where the table variable has an advantage
over the other tables.
Joined Query:
Now statistics have a
huge influence on the execution plan. Since we joined our table types with the
large table (1M rows), the optimizer needs statistics to figure out which is
the best plan to use.
Conclusion:
According to the tests
above, it is pretty clear that the permanent table is
not good in any case for such calculations. It has too much
of an overhead.
As stated before,
the table variable seems to be a better choice for simple
calculations when there is only one way to execute the procedure,
and the temp table would be the best choice when the procedure
can be executed in more than one way.
To conclude, there is no
universal rule of when and where to
use temp tables or table variables. Try them both and
experiment in both sides of the spectrum – small and large amounts of records.
I, from my personal
experience, would rather go for temp tables. Usually we already have the plan
in cache, so we don’t have to worry about recompilation. Moreover, the
advantage the table variable has over the temp table in simple calculations is
minor compared to the advantage the temporary table has for complex ones.
Good Luck! J
wow nice example
ReplyDeleteBasic rule of thumb as per this blog:
ReplyDeletehttp://blogs.davelozinski.com/curiousconsultant/optimizing-sql-strategies
When using small amounts of data (eg, a few rows), table variables are great! Otherwise, #temp tables are the way to go.